fix: speed up scheduler queue views#728
Conversation
Maintain fair-queue group counts and resource demand as tasks enter and leave the ready queue, so QueueView creation no longer scans every queued task in scheduler hot paths. Add regression coverage for queue accounting after discard/commit and for avoiding full queued-task value scans. Fixes #724 Signed-off-by: Eric W. Tramel <1223539+eric-tramel@users.noreply.github.com>
Review: PR #728 — fix: speed up scheduler queue viewsSummaryReplaces the O(N-queued-tasks) scan inside FindingsCorrectness
Tests
Style / conventions
Performance
VerdictApprove with minor optional follow-ups. This is a clean, well-targeted performance fix. The accounting is maintained symmetrically with the existing
|
Greptile SummaryThis PR replaces the O(n) full-queue scan in
|
| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/scheduling/queue.py | Core scheduler rewrite — incremental accounting counters added and maintained correctly across enqueue/discard/commit; view() and _first_valid_item simplified to use pre-built state; no logic errors found. |
| packages/data-designer-engine/tests/engine/dataset_builders/scheduling/test_queue.py | Three new tests cover idempotent-enqueue accounting, post-removal accounting, and a locked-items sentinel that enforces non-scanning of non-candidate tasks. |
Reviews (3): Last reviewed commit: "Merge branch 'main' into codex/fix-724-s..." | Re-trigger Greptile
Signed-off-by: Eric W. Tramel <1223539+eric-tramel@users.noreply.github.com>
johnnygreco
left a comment
There was a problem hiding this comment.
Review complete. I did not find any actionable issues.
I checked the fair-queue accounting changes around duplicate enqueue, discard, commit, lazy head purging, and QueueView demand snapshots. The incremental counters look consistent with the previous queue membership contract, and the added tests cover the important stale-accounting risks.
Verification run:
ruff checkon the changed filesruff format --checkon the changed filespytest packages/data-designer-engine/tests/engine/dataset_builders/scheduling/test_queue.py -q(10 passed)git diff --check origin/main...HEAD
johnnygreco
left a comment
There was a problem hiding this comment.
Approved. My review found no actionable issues or required updates.
📋 Summary
Fixes scheduler hot-path scaling from Issue #724 by maintaining fair-queue group counts and resource demand incrementally. This keeps
FairTaskQueue.view()from rebuilding queue summaries by scanning every queued task during dispatch and diagnostics.🔗 Related Issue
Fixes #724
🔄 Changes
QueueViewfrom maintained accounting plus first-candidate group heads.🧪 Testing
make check-all-fixuv run pytest packages/data-designer-engine/tests/engine/dataset_builders/scheduling/test_queue.py- 10 passedmake test-engine- 2,217 passedPerformance Demonstration
Same-machine benchmark loading
origin/mainand the fixed branch in one run:origin/mainmedianFairTaskQueue.view()over 8,192 queued tasks, 100 calls✅ Checklist